Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement maps_column_view abstraction over LIST<STRUCT<K,V>> #10380

Merged
merged 26 commits into from
Mar 14, 2022

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Mar 2, 2022

Fixes #9109.

This commit adds a map abstraction over a column_view of type LIST<STRUCT<K,V>>, where K and V are key and value types. A list column of structs with two members may thus be viewed as a map column.

maps_column_view is to a LIST<STRUCT<K,V>> column what lists_column_view is to a LIST column.

The maps_column_view abstraction provides methods to fetch lists of keys and values (as LIST<K> and LIST<V> respectively). It also provides map lookup methods to find the values corresponding to a specified key, for each row in the "map" column.

E.g.

auto input_column = get_list_of_structs_col();
// input_column == [ {1:10, 2:20}, {1:100, 3:300}, {2:2000, 3:3000, 4:4000} ];

auto maps_view = cudf::jni::maps_column_view{input_column->view()};
auto keys = maps_view.keys();     // keys   == [ {1,2},   {1,3},      {2,3,4} ];
auto values = maps_view.values(); // values == [ {10,20}, {100, 300}, {2000, 3000, 4000} ];

auto lookup_1 = maps_view.get_values_for( *make_numeric_scalar(1) );
// lookup_1 = [ {10, 100, null} ];

This abstraction should help replace the Java/JNI map_lookup and map_contains kernels, which only handles MAP<STRING, STRING>.

@mythrocks mythrocks requested a review from a team as a code owner March 2, 2022 00:39
@mythrocks mythrocks self-assigned this Mar 2, 2022
@mythrocks mythrocks requested a review from a team as a code owner March 2, 2022 00:39
@mythrocks mythrocks requested review from trxcllnt and ttnghia March 2, 2022 00:39
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Mar 2, 2022
@mythrocks mythrocks marked this pull request as draft March 2, 2022 00:40
@mythrocks
Copy link
Contributor Author

This PR is a work in progress, and in draft.

@mythrocks mythrocks added feature request New feature or request non-breaking Non-breaking change labels Mar 2, 2022
@mythrocks mythrocks changed the title [FEA] Implement a maps_column_view abstraction over LIST<STRUCT<K,V>> Implement a maps_column_view abstraction over LIST<STRUCT<K,V>> Mar 2, 2022
Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that a map type doesn't exist in libcudf, I don't think this code should live in the libcudf header/source tree. I would expect this to live in the JNI layer or whatever we're calling the libcudf-spark companion library.

@mythrocks
Copy link
Contributor Author

I would expect this to live in the JNI layer or whatever we're calling the libcudf-spark companion library.

I'll move this over to the java/src/native directory before moving out of draft.

@github-actions github-actions bot added the Java Affects Java cuDF API. label Mar 3, 2022
@github-actions github-actions bot added the conda label Mar 3, 2022
Copy link
Contributor

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the minor comment about min() vs. max(), lgtm from the Java perspective.

Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving ops-codeowner file changes

@mythrocks mythrocks requested a review from ttnghia March 14, 2022 18:18
@mythrocks
Copy link
Contributor Author

Thanks for the reviews, all.

I'll merge this once I have rerun tests.

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 228cc79 into rapidsai:branch-22.04 Mar 14, 2022
@mythrocks
Copy link
Contributor Author

Thank you for the reviews, all. This change was just merged.

mythrocks added a commit to mythrocks/cudf that referenced this pull request Jul 7, 2022
This commit removes unused code in map_lookup.cu and map_lookup.hpp.
rapidsai#10380 replaced the functionality provided in map_lookup.*, and expanded
map-lookup support to all value types, and string/integral key types.
The old map_lookup kernels were retained "just in case".

Removing the old kernels should reduce confusion, and avoid clutter.
rapids-bot bot pushed a commit that referenced this pull request Jul 8, 2022
This commit removes unused code in `map_lookup.cu` and `map_lookup.hpp`.
#10380 replaced the functionality provided in `map_lookup.*`, and expanded
map-lookup support to all value types, and string/integral key types.
The old map_lookup kernels were retained "just in case".

Removing the old kernels should reduce confusion, and avoid clutter.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #11221
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] JNI support get map value for all value types and for all basic key types
5 participants